Semantically Motivated Improvements for PPM Variants
Abstract
The on-line sequence modelling algorithm 'Prediction by Partial Matching' (PPM) has set the performance standard in lossless data compression research since Moffat's 1990 implementation, PPMC. Despite intense research activity, only Howard's 1993 escape-count update mechanism 'D' has provided any consistent, order-independent performance improvement to PPMC (about 1%). Most notably, the recently introduced PPM variant, PPM*, which eliminates PPM's order bound, fails to offer compression results superior to those of PPMC with Markov order greater than four. This paper explains how to significantly improve the compression performance of any PPM variant (by 5-12%) by combining PPM's probability estimator, 'blending', with information-theoretic state selection. Hazards inherent to this combination are overcome by identifying the distinct semantics of the two approaches and resolving the differences using a dual-frequency update mechanism. We present and apply our percolating state selector, plus an enhancement to blending, both of which we have recently shown to independently outperform all competing techniques from the literature. We also give a minimal linear-space suffix-tree implementation of PPM and PPM*. Performance is measured in experiments run on the Calgary Corpus using our reimplementation of the original algorithms in an executable cross-product of independent model components, which permits precise control of all modelling algorithm features.
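For orientation, the two escape estimators named in the abstract (method C, used by PPMC, and Howard's method D) can be sketched for a single context as follows. This is a minimal sketch based on the commonly cited formulas for these methods, not the paper's implementation: the function name `escape_estimates` and the example counts are illustrative assumptions, and blending across orders, exclusion, and the dual-frequency update are not modelled.

```python
from fractions import Fraction


def escape_estimates(counts, method):
    """Return (symbol_probs, escape_prob) for one context.

    counts: dict mapping each symbol seen in this context to its frequency.
    method: "C" or "D" (hypothetical selector used only in this sketch).
    """
    n = sum(counts.values())  # total symbol occurrences in this context
    q = len(counts)           # number of distinct symbols seen so far
    if n == 0:
        # Nothing seen yet: the context can only escape to a lower order.
        return {}, Fraction(1)
    if method == "C":
        # Method C: the escape count equals the number of distinct symbols.
        denom = n + q
        probs = {s: Fraction(c, denom) for s, c in counts.items()}
        return probs, Fraction(q, denom)
    if method == "D":
        # Method D: each novel symbol adds 1/2 to its own count and 1/2 to
        # the escape count, giving (2c - 1) / 2n per symbol and q / 2n escape.
        denom = 2 * n
        probs = {s: Fraction(2 * c - 1, denom) for s, c in counts.items()}
        return probs, Fraction(q, denom)
    raise ValueError("method must be 'C' or 'D'")


if __name__ == "__main__":
    # Order-0 counts for the string "abracadabra" (illustrative input only).
    counts = {"a": 5, "b": 2, "r": 2, "c": 1, "d": 1}
    for method in ("C", "D"):
        probs, esc = escape_estimates(counts, method)
        print(method, "P(a) =", probs["a"], "P(escape) =", esc)
```

With these counts, method D assigns less probability to the escape event than method C does, and in both cases the symbol probabilities and the escape probability sum to one.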
Journal title:
- Comput. J.
Volume 40, Issue
Pages -
Publication date: 1997